Empirical analysis of representation learning and exploration in neural kernel bandits
Neural bandits have been shown to provide an efficient solution to practical
sequential decision tasks that have nonlinear reward functions. The main
contributor to that success is approximate Bayesian inference, which enables
neural network (NN) training with uncertainty estimates. However, Bayesian NNs
often suffer from a prohibitive computational overhead or operate on a subset
of parameters. Alternatively, certain classes of infinite neural networks were
shown to directly correspond to Gaussian processes (GP) with neural kernels
(NK). NK-GPs provide accurate uncertainty estimates and can be trained faster
than most Bayesian NNs. We propose to guide common bandit policies with NK
distributions and show that NK bandits achieve state-of-the-art performance on
nonlinear structured data. Moreover, we propose a framework for measuring
independently the ability of a bandit algorithm to learn representations and
explore, and use it to analyze the impact of NK distributions with respect to those two
aspects. We consider policies based on a GP and a Student's t-process (TP).
Furthermore, we study practical considerations, such as training frequency and
model partitioning. We believe our work will help better understand the impact
of utilizing NKs in applied settings.
Comment: Extended version. Added a major experiment comparing NK distributions
w.r.t. exploration and exploitation. Submitted to ICLR 202
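The abstract describes guiding standard bandit policies with a GP posterior. A minimal sketch of how that works, assuming a generic RBF kernel as a stand-in for a neural kernel (the abstract does not specify the kernel or policy hyperparameters), using a UCB acquisition over candidate arms:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel; a placeholder for a neural kernel (NK).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_train, y_train, X_cand, noise=1e-2, kernel=rbf_kernel):
    # Standard GP regression posterior mean and variance at candidate arms.
    K = kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = kernel(X_train, X_cand)
    Kss = kernel(X_cand, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - (v ** 2).sum(0)
    return mu, np.maximum(var, 1e-12)

def ucb_select(X_train, y_train, X_cand, beta=2.0):
    # UCB policy: pick the arm maximizing posterior mean + beta * std,
    # so high-uncertainty arms are explored and high-mean arms exploited.
    mu, var = gp_posterior(X_train, y_train, X_cand)
    return int(np.argmax(mu + beta * np.sqrt(var)))
```

A Thompson-sampling variant would instead draw one sample from the same posterior and pick its argmax; the paper additionally considers Student's t-process posteriors, which this sketch does not cover.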
Federated Training of Dual Encoding Models on Small Non-IID Client Datasets
Dual encoding models that encode a pair of inputs are widely used for
representation learning. Many approaches train dual encoding models by
maximizing agreement between pairs of encodings on centralized training data.
However, in many scenarios, datasets are inherently decentralized across many
clients (user devices or organizations) due to privacy concerns, motivating
federated learning. In this work, we focus on federated training of dual
encoding models on decentralized data composed of many small, non-IID
(independent and identically distributed) client datasets. We show that
existing approaches that work well in centralized settings perform poorly when
naively adapted to this setting using federated averaging. We observe that we
can simulate large-batch loss computation on individual clients for loss
functions that are based on encoding statistics. Based on this insight, we
propose a novel federated training approach, Distributed Cross Correlation
Optimization (DCCO), which trains dual encoding models using encoding
statistics aggregated across clients, without sharing individual data samples.
Our experimental results on two datasets demonstrate that the proposed DCCO
approach outperforms federated variants of existing approaches by a large
margin.
Comment: ICLR 2023 Workshop on Pitfalls of Limited Data and Computation for
Trustworthy M
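The key insight above is that a large-batch loss can be simulated from per-client encoding statistics. A minimal sketch of that idea, assuming the relevant statistic is a cross-correlation matrix between the two encoders' outputs (the exact statistics and protocol of DCCO are not given in the abstract); each client ships only sums, sums of squares, and cross products, never raw samples:

```python
import numpy as np

def client_stats(Za, Zb):
    # Per-client sufficient statistics for a cross-correlation loss.
    # Za, Zb: (n_local, d) encodings of the two views/inputs.
    return {
        "n": Za.shape[0],
        "sum_a": Za.sum(0), "sum_b": Zb.sum(0),
        "sq_a": (Za ** 2).sum(0), "sq_b": (Zb ** 2).sum(0),
        "cross": Za.T @ Zb,
    }

def aggregate_cross_correlation(stats_list):
    # Server-side: combine per-client statistics to recover the same
    # normalized cross-correlation matrix a single large centralized
    # batch would produce, without ever seeing individual samples.
    n = sum(s["n"] for s in stats_list)
    mu_a = sum(s["sum_a"] for s in stats_list) / n
    mu_b = sum(s["sum_b"] for s in stats_list) / n
    var_a = sum(s["sq_a"] for s in stats_list) / n - mu_a ** 2
    var_b = sum(s["sq_b"] for s in stats_list) / n - mu_b ** 2
    cov = sum(s["cross"] for s in stats_list) / n - np.outer(mu_a, mu_b)
    return cov / np.sqrt(np.outer(var_a, var_b) + 1e-12)
```

Because sums and cross products are additive, splitting the batch across many small non-IID clients changes nothing in the aggregated matrix, which is exactly what makes the simulated large-batch loss possible.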
BERT for Long Documents: A Case Study of Automated ICD Coding
Transformer models have achieved great success across many NLP problems.
However, previous studies in automated ICD coding concluded that these models
fail to outperform some of the earlier solutions such as CNN-based models. In
this paper we challenge this conclusion. We present a simple and scalable
method to process long text with the existing transformer models such as BERT.
We show that this method significantly improves the previous results reported
for transformer models in ICD coding, and is able to outperform one of the
prominent CNN-based methods.
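A common way to let a fixed-length transformer such as BERT process long text, and plausibly the kind of simple, scalable scheme the abstract alludes to (the exact method is not specified there), is to split the document into overlapping chunks, encode each chunk independently, and pool the per-chunk label scores, which suits multi-label ICD coding:

```python
import numpy as np

def chunk_tokens(token_ids, max_len=512, stride=256):
    # Split a long token sequence into overlapping windows that each
    # fit the transformer's input limit (512 tokens for BERT).
    chunks = []
    for start in range(0, max(len(token_ids) - stride, 1), stride):
        chunks.append(token_ids[start:start + max_len])
    return chunks

def predict_codes(token_ids, encode_fn, pool="max"):
    # encode_fn is a hypothetical stand-in for a BERT classifier head
    # mapping one chunk to per-label scores. Per-chunk scores are pooled
    # into one document-level prediction; max-pooling fires a label if
    # any chunk provides evidence for it.
    scores = np.stack([encode_fn(c) for c in chunk_tokens(token_ids)])
    return scores.max(0) if pool == "max" else scores.mean(0)
```

The overlap (`stride` < `max_len`) keeps evidence that straddles a chunk boundary visible to at least one chunk; both window size and pooling rule are tunable assumptions here.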
Sample adaptive multiple kernel learning for failure prediction of railway points
Railway points are among the key components of railway infrastructure. As part of signal equipment, points control the routes of trains at railway junctions, having a significant impact on the reliability, capacity, and punctuality of rail transport. They are also among the most fragile parts of railway systems, and points failures cause a large portion of railway incidents. Traditionally, maintenance of points is performed at fixed time intervals or triggered after equipment failures. It would instead be of great value to forecast points failures and take action beforehand, minimising any negative effects. To date, most existing prediction methods are either lab-based or rely on specially installed sensors, which makes them infeasible for large-scale deployment, and they often use data from only one source.
We therefore explore a new approach that integrates readily available multi-source data to fulfil this task. We conducted our case study on the Sydney Trains rail network, an extensive network of passenger and freight railways. Real-world data are usually incomplete for various reasons, e.g., faults in the database, operational errors, or transmission faults. Moreover, railway points differ in their locations, types, and other properties, making it hard to predict their failures with a single unified model. Aiming at this challenging task, we first constructed a dataset from multiple sources and selected key features with the help of domain experts. We formulate our prediction task as a multiple kernel learning problem with missing kernels and present a robust multiple kernel learning algorithm for predicting points failures. Our model takes into account the missing pattern of the data as well as the inherent variance across different sets of railway points.
Extensive experiments demonstrate the superiority of our algorithm compared with other state-of-the-art methods.
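The multiple kernel learning setup above combines one base kernel per data source, with some kernels missing for some samples. A minimal sketch of the combination step, assuming a convex weighting of base kernels and a per-source availability mask (the paper's actual handling of missing kernels and its learning of the weights are not detailed in the abstract):

```python
import numpy as np

def combine_kernels(kernels, weights, available=None):
    # Convex combination K = sum_m w_m * K_m of base kernel matrices.
    # available[m][i] == 1 marks samples with data for source m; a pair
    # (i, j) where either sample is missing contributes nothing to K
    # from that kernel, encoding the missing-kernel pattern directly.
    K = np.zeros_like(kernels[0], dtype=float)
    for m, (Km, w) in enumerate(zip(kernels, weights)):
        if available is None:
            K += w * Km
        else:
            mask = np.outer(available[m], available[m])
            K += w * Km * mask
    return K
```

The combined matrix can then be fed to any kernel method (e.g., an SVM-style classifier) for the failure-prediction step; masking before combination is one simple assumption, not necessarily the paper's scheme.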